-
Notifications
You must be signed in to change notification settings - Fork 49
Add a Stdin Specification #586
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@brianmcgee @zimbatm, is there anything I can do to help move this forward? |
|
It's unclear to me if this documents the current behaviour or the desired behaviour. If it's the desired behaviour, isn't there some work needing to be done in |
Yes, treefmt needs changes to support this. I'm happy to implement that (in this PR or a separate one). I just don't want to put in the effort until I know we like this approach. |
|
I'll try to put my brain on this in the next few days. Been a busy few weeks 😫 |
|
@brianmcgee, friendly ping |
brianmcgee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with some of your own musings in the comments you'd left. Otherwise, it seems like a sane spec and one which should be easy enough for formatters to follow.
I also like that this is opt-in.
|
P.S. Thanks for taking the time to flesh this out, and sorry for taking a while to get around to reviewing this. |
8e6e1cb to
5671029
Compare
5671029 to
d8276a6
Compare
infinisil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
@brianmcgee @zimbatm, I could use another review on this. If we're happy with the state of this, then I'd like to work on the underlying implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor things, but directionally it makes sense.
| formatting "virtual files" passed via stdin. | ||
|
|
||
| A formatter **MUST** implement the Stdin Specification if its formatting behavior | ||
| can depend on the name of the file being formatted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit hard to read
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but I don't know how to fix it. Does flipping the sentence around help? Suggestions appreciated.
If a formatter's behavior can depend on the name of the file being formatted, then it MUST implement the Stdin Specification.
| "stdin mode". In stdin mode, the formatter reads file contents from stdin | ||
| rather than the filesystem. | ||
| - The formatter _MAY_ understand `--stdin-filepath=<path>` as well, but **MUST** | ||
| understand the space separated variant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make sense to outline the purpose of the <path> argument explicitly. Both you and I know why it's there, but maybe not the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the path argument is not needed by the tool, it's also valid to ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a stab at this. Please take a look.
| vanilla [Formatter Specification](/reference/formatter-spec), and additionally | ||
| satisfy the following: | ||
|
|
||
| ### 1. `--stdin-filepath` flag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ### 1. `--stdin-filepath` flag | |
| ### 1. `--stdin` flag |
KISS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm lightly opposed to this, for 2 reasons:
- I think
--stdin-filepathis clearer that the option takes a value. treefmtitself currently implements a--stdinflag with different semantics, and I would like fortreefmtto implement the new stdin spec. Changing this might be awkward?
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jfly FWIW, that sounds reasonable to me (I agree with and I would like for treefmt to implement the new stdin spec).
| Formatters **MAY** also implement the Stdin Specification, which allows | ||
| formatting "virtual files" passed via stdin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this support expressed in the treefmt.toml config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we'd add a new formatter.<mylanguage>.stdin (or stdin_supported) bool that defaults to false if unspecified. Feedback appreciated.
Do you think the stdin spec is the place to put this? I was thinking this would go in our config docs.
This completes step 1 of numtide#573 (comment).
d8276a6 to
a94c619
Compare
|
Any remaining work to be done here, or is this just waiting for review? I'd be happy to lend a hand, if there's any way I can help. |
|
@cstrahan I think this is ready for implementation. I'm pretty swamped right now. If you're up for putting together an implementation, I'd be happy to code review! |
| - `--stdin-filepath <path>` is an optional flag that puts the formatter in | ||
| "stdin mode". In stdin mode, the formatter reads file contents from stdin | ||
| rather than the filesystem. | ||
| - The formatter _MAY_ alter its behavior based on the given `<path>`. For | ||
| example, if a there are different formatting rules in different | ||
| directories. If the formatter's behavior doesn't depend on the given | ||
| `<path>`, it's ok to ignore it. | ||
| - The formatter _MAY_ understand `--stdin-filepath=<path>` as well, but **MUST** | ||
| understand the space separated variant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are tools out there whose cmdline parsers just do not support double dash flags with values. For example:
https://pkg.go.dev/flag#hdr-Command_line_flag_syntax
According to that documentation, any go-based formatter cannot support this without swapping away from the standard library's flags package. Treefmt itself is saved by using cobra.
Here are a few examples of stdin mode from a repo of mine. I count 5 6 7 unique ways of specifying this:
- detects stdin, doesn't need a filepath:
rustfmt,gofmt,alejandra --quiet - requires a - path:
ruff format --silent -,mdformat -,tofu fmt - buildifier -path=$path -taplo format - --stdin-filepath=$pathprettier --stdin-filepath=$pathbiome format --write --stdin-file-path $path- An old internal tool using
--stdin-filename=...
Given this extensive variation, I believe the structure of the command line flags should not be specified AT ALL. You should just let the user describe what the tool accepts. The way jj fix configures this is quite good: it substitutes $path in command args like --stdin-filepath=$path (coincidentally every tool must run in stdin mode). Making treefmt work with everything it possibly can is more important than making a nice spec. I dread having to make PRs to 15-year-old barely-maintained build tools to make them support the spec, or writing endless formatting wrapper scripts. Basically please don't make me do that.
If the tool does support stdin mode, I don't see why you would ever want to run it in non-stdin mode. For example, rustfmt has both modes, but rustfmt file.rs does a filesystem walk from there for every mod statement it discovers. That's got to be bad for performance if treefmt is scheduling one rustfmt per file, since a file is formatted N times if it's N modules deep in a tree! (Edit: Unless you're running multiple files per formatter instance, in which case rustfmt does not duplicate work and will keep its code in cache better, but you still may not want this if some other formatter is single threaded.)
Therefore, I think you should just add a toml option saying "this tool runs in stdin mode", or perhaps just an alternate set of command line flags to use in stdin mode. Stdin mode is preferable in many cases, for example you can coordinate 5 formatter processes and pipe their stdin all the way through without buffering after every formatter. So many tools support it already with so many different flags that it's worth being flexible so this can be useful without waiting 2 years for a bunch of CLI flag adjustments to go through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to that documentation, any go-based formatter cannot support this without swapping away from the standard library's flags package
Sorry, I don't follow this point. From the page you linked to, I see this:
The following forms are permitted:
--flag // double dashes are also permitted
What am I missing?
Basically please don't make me do that. [...] it's worth being flexible so this can be useful without waiting 2 years for a bunch of CLI flag adjustments to go through.
I hear you. We will definitely need a compatibility story, but it's TBD if that lives in treefmt itself, or treefmt-nix.
If the tool does support stdin mode, I don't see why you would ever want to run it in non-stdin mode. [...] Edit: Unless you're running multiple files per formatter instance, in which case rustfmt does not duplicate work and will keep its code in cache better
This is exactly why: invoking the formatter N / B times (where B is our batch size) rather than N times.
you still may not want this if some other formatter is single threaded
This is a good question. We probably kind of assume that formatters are single threaded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it seems the docs page for go flags is just terrible. It says "the following forms are permitted" and enumerates -flag and --flag as if to be exhaustive, but doesn't list --flag=value or --flag value. These do in fact work, as I just discovered. Notoriously posix getopt does not support that.
Compatibility story... TBD if that lives in treefmt itself, or treefmt-nix.
Please put it in treefmt. I don't want to use treefmt-nix. Sorry but I'm not learning yet another config schema, especially an opinionated one. And presuming you mean treefmt-nix would contain wrappers for all the well-known formatters to force them to fit the spec, one of the worst parts of Nix generally is that e.g. clang is buried about 40 layers deep under hacky bash wrappers each made of 6 layers of @@TEMPLATING_SYNTAX@@ and you never know what flags are actually being passed to the thing. Aside from making treefmt difficult to use for anything not covered by treefmt-nix, I don't know why you'd want to perpetuate that or take on that workload.
For everyday difficulty, try this example... let's say you make a typo over and over again. You have a code formatter to replace that specific typo. (This is legitimately very useful with jj fix for renaming a type or rewriting code across 80 commits at once, including code that was introduced and then replaced in a later commit.) You use sd to do it:
[formatter.cameron]
command = "sd"
options = ["Totanic", "Titanic"]
includes = ["*.md"]
This is instantly not compatible with the stdin spec. Basic things like this should be silently using tempfiles and not bothering me about spec compliance.
The batching etc
I think best solution is have a batch mode and a stdin mode. If no stdin mode is specified we use a temp file for stdin tasks. If no regular (batch mode) options are specified we use stdin when batch mode is called for. Bikeshed the options names but basically
[formatter.rust]
command = "rustfmt"
options = ["--edition", "2018"]
stdin_options = ["--edition", "2018", "--stdin-filepath=$path"]
includes = ["*.rs"]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You use
sdto do it:[formatter.cameron] command = "sd" options = ["Totanic", "Titanic"] includes = ["*.md"]This is instantly not compatible with the stdin spec.
I don't see the problem here. You are correct that this tool does not implement the stdin spec, but what's the issue? It's optional to implement. From the proposed spec:
Formatters MAY also implement the Stdin Specification [...] A formatter MUST implement the Stdin Specification if its formatting behavior can depend on the name of the file being formatted.
I think best solution is have a batch mode and a stdin mode
I like your options and stdin_options suggestion here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my point is it's optional, but if you don't implement it, what does treefmt --stdin scene-01.md do? And if your answer is that treefmt uses a tempfile because it doesn't support the spec, how do we know that? And how does treefmt decide to use --stdin-filepath=file.rs for something that does support the spec?
Realistically if we follow this to its conclusion, we get to either "we always pass --stdin-filepath=... and if the tool does not support it, that's their fault", or "we sometimes pass it, depending on a config flag for each formatter". Either option has a dead zone of formatters that don't support the spec when running treefmt --stdin which can't be supported except with a wrapper script (which, notably, you can't write easily in bash, because of getopt not supporting double dash flags).
The stdin_options approach does not have a dead zone and only comes at the cost of repeating yourself slightly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if your answer is that treefmt uses a tempfile because it doesn't support the spec, how do we know that?
I'm sorry, but I don't see the problem here. This is something treefmt already supports, and I'm not proposing changing that. It uses a tempfile under the hood because how else could it possibly work?
And how does treefmt decide to use
--stdin-filepath=file.rsfor something that does support the spec?
We could add a boolean for stdin_spec, but I hear you: at the point that we're implementing something like stdin_options, it's probably simpler to just be opinionated unopinionated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It uses a tempfile under the hood because how else could it possibly work?
I asked only to prompt the necessity of some kind of config, instead of unconditionally assuming it supports the spec and adding --stdin-filepath=.... I think we understand each other now.
This completes step 1 of
#573 (comment).